Local nearest neighbour classification with applications to semi-supervised learning
Authors
Abstract
We derive a new asymptotic expansion for the global excess risk of a local k-nearest neighbour classifier, where the choice of k may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the locus of points at which each class label is equally likely to occur, but we also show that if these conditions are not satisfied, the dominant contribution may arise from the tails of the marginal distribution of the features. Moreover, we prove that, provided the d-dimensional marginal distribution of the features has a finite ρth moment for some ρ > 4 (as well as other regularity conditions), a local choice of k can yield a rate of convergence of the excess risk of O(n^{−4/(d+4)}), where n is the sample size, whereas for the standard k-nearest neighbour classifier, our theory would require d ≥ 5 and ρ > 4d/(d − 4) finite moments to achieve this rate. Our results motivate a new k-nearest neighbour classifier for semi-supervised learning problems, where the unlabelled data are used to obtain an estimate of the marginal feature density, and fewer neighbours are used for classification when this density estimate is small. The potential improvements over the standard k-nearest neighbour classifier are illustrated both through our theory and via a simulation study.
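To make the proposed classifier concrete, here is a minimal sketch in Python of a density-adaptive k-nearest neighbour rule of the kind described above: the unlabelled data feed a kernel density estimate, and fewer neighbours are used where that estimate is small. The specific scaling k(x) ∝ (n f̂(x))^{4/(d+4)}, the bandwidth choice, and the scikit-learn helpers are illustrative assumptions, not the authors' implementation or tuning.

```python
# A minimal sketch (not the paper's implementation) of a semi-supervised
# k-nearest neighbour classifier in which the number of neighbours k(x)
# shrinks where an unlabelled-data density estimate is small.
# Assumes integer class labels 0, 1, ..., K-1.
import numpy as np
from sklearn.neighbors import KernelDensity, NearestNeighbors


def fit_local_knn(X_labelled, y_labelled, X_unlabelled, bandwidth=1.0):
    """Return a prediction function using a local, density-dependent k."""
    n, d = X_labelled.shape
    # Density estimate from the (typically plentiful) unlabelled features.
    kde = KernelDensity(bandwidth=bandwidth).fit(X_unlabelled)
    nn = NearestNeighbors().fit(X_labelled)

    def predict(X_test):
        f_hat = np.exp(kde.score_samples(X_test))  # estimated density at x
        # Local choice of k: fewer neighbours where f_hat is small, capped
        # between 1 and n.  The exponent 4/(d+4) mirrors the rate quoted in
        # the abstract and is an assumption made here for illustration.
        k_local = np.clip((n * f_hat) ** (4.0 / (d + 4)), 1, n).astype(int)
        preds = np.empty(len(X_test), dtype=y_labelled.dtype)
        for i, (x, k) in enumerate(zip(X_test, k_local)):
            _, idx = nn.kneighbors(x.reshape(1, -1), n_neighbors=int(k))
            # Majority vote among the k(x) nearest labelled neighbours.
            preds[i] = np.bincount(y_labelled[idx[0]]).argmax()
        return preds

    return predict
```

A predictor returned by fit_local_knn can be applied directly to test features; in a semi-supervised setting X_unlabelled would typically be much larger than the labelled sample, so the density estimate is comparatively cheap to obtain.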
Similar resources
Scene Classification Via pLSA
Given a set of images of scenes containing multiple object categories (e.g. grass, roads, buildings), our objective is to discover these objects in each image in an unsupervised manner, and to use this object distribution to perform scene classification. We achieve this discovery using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature, here ap...
Enhancing the Performance of Semi-Supervised Classification Algorithms with Bridging
Traditional supervised classification algorithms require a large number of labelled examples to perform accurately. Semi-supervised classification algorithms attempt to overcome this major limitation by also using unlabelled examples. Unlabelled examples have also been used to improve nearest neighbour text classification in a method called bridging. In this paper, we propose the use of bridgin...
Nearest Neighbour Classification with Background Knowledge Extended to Semi-supervised Learning
Semi-supervised methods involve converting unlabelled data into high-quality labelled data that can be used to improve the performance of conventional supervised methods that had previously been given a small training set. Unlabelled data has also been shown to be helpful in a supervised setting called ‘bridging’ where unlabelled data have been used to help relate labelled instances to those th...
Semi-Supervised Self-Organizing Feature Map for Gene Classification
In this thesis, a study on gene expression data analysis is carried out using supervised, unsupervised and semi-supervised approaches. The task of class prediction for six gene expression datasets (namely, Brain Tumor, Colon Cancer, Leukemia, Lymphoma and SRBCT) has been carried out. Here, a one-dimensional self-organizing feature map (SOFM) in a semi-supervised learning framework is developed f...
Anomaly Detection
This chapter presents an extension of conformal prediction for anomaly detection applications. It includes the presentation and discussion of the Conformal Anomaly Detector (CAD) and the computationally more efficient Inductive Conformal Anomaly Detector (ICAD), which are general algorithms for unsupervised or semi-supervised and offline or online anomaly detection. One of the key properties of...
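As a rough illustration of the inductive scheme mentioned in this snippet, the sketch below computes conformal p-values for new points from calibration nonconformity scores. The k-NN-distance nonconformity measure and the exact p-value convention are common choices assumed here for illustration, not details taken from the chapter.

```python
# A minimal sketch of inductive conformal anomaly detection (ICAD-style):
# small p-values indicate more anomalous test points.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def icad_p_values(X_train, X_calibration, X_test, k=5):
    """Return one conformal p-value per test point."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)

    def score(X):
        # Nonconformity: average distance to the k nearest training points.
        dist, _ = nn.kneighbors(X, n_neighbors=k)
        return dist.mean(axis=1)

    alpha_cal = score(X_calibration)   # calibration nonconformity scores
    alpha_test = score(X_test)
    m = len(alpha_cal)
    # p-value: proportion of calibration scores at least as extreme as the
    # test score; the "+1" terms account for the test point itself.
    return np.array([(np.sum(alpha_cal >= a) + 1) / (m + 1) for a in alpha_test])


# Usage (hypothetical data): flag points whose p-value falls below a chosen
# significance level, e.g. anomalies = icad_p_values(X_tr, X_cal, X_new) < 0.05
```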
Journal title: CoRR
Volume: abs/1704.00642
Issue: -
Pages: -
Publication year: 2017